Abstract

The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using
a combination of different new-generation multiplex sequencing platforms. The total length of the
non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the
622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were
16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of
the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and
miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes,
43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene
coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization 
of the assigned carnation genes and comparison with those of other plant species revealed characteristic fea- 
tures of the carnation genome. The results of this study will serve as a valuable resource for fundamental and 
applied research of carnation, especially for breeding new carnation varieties. Further information on the 
genomic sequences is available at http://carnation.kazusa.or.jp. 
1. Introduction

Carnation (Dianthus caryophyllus L.) is one of the 
major floricultural crops in Japan and worldwide. It is 
a member of the family Caryophyllaceae and belongs 
to the genus Dianthus. More than 300 Dianthus 
species have been recorded. 1 Many Dianthus species 
are distributed throughout Europe and Asia, and the 
distribution of this genus extends to arctic North 
America and to mountainous sites in Africa. 2 Several
species, including D. caryophyllus, D. barbatus, D. chi- 
nensis, D. plumarius, D. superbus and their hybrids are 
widely used as horticultural cultivars. 3 Many new car- 
nations have been bred for attractive characteristics 
such as flower colour, flower size, fragrance and flower 
longevity. 

The pigments in carnation flowers are mainly antho- 
cyanin and chalcone derivatives, and most of the genes 
involved in pigment biosynthesis in carnation have been 
identified. 4 Due to the absence of flavonoid 3',5'- 
hydroxylase (F3',5'H; a key enzyme in the synthesis of 
delphinidin) in carnation, blue or violet flowers have 
never occurred in carnations. The introduction of a 
petunia or pansy F3',5'H gene into carnation has led 
to the creation of blue or violet transgenic carnations, 
which are commercially available. 5 The plant pigments
of species belonging to the families of Caryophyllales 
(except for Caryophyllaceae and Molluginaceae) are 
betalains, which have never been detected together with anthocyanins
in the same species. 6 Carnation, which exceptionally bears anthocyanins
within the Caryophyllales, is therefore an attractive material for
studying the evolution of genetic systems for pigment synthesis.

The vase life of cut flowers, or flower longevity, is one
of the most important characteristics of carnation. 7
Carnation flowers are highly sensitive to ethylene, 
which induces autocatalytic ethylene production and 
wilting in carnation petals. 8 Conventional cross-breeding 
techniques have succeeded in improving the vase life of 
the carnation flower, 9 which is a polygenic trait that is 
controlled by several genes involved in ethylene production
and ethylene sensitivity. 9,10

To clarify the genetic and physiological mechanisms of 
agriculturally important traits, and to apply this 
information to actual breeding, a number of genetic 
and molecular tools have been developed. Genetic 
linkage maps of the carnation genome have been con- 
structed and used to identify quantitative trait loci 
(QTL) responsible for resistance to carnation bacterial 
wilt. 11,12 With the aid of next-generation sequencing 
(NGS) technology, large-scale transcriptome analysis 
(RNA-seq) has been conducted, revealing 300 740 uni- 
genes consisting of 37 844 contigs and 262 896 single- 
tons. 13 Recently, we constructed a reference genetic 
linkage map for carnation using simple sequence repeat 
(SSR) markers derived from this RNA-seq analysis. 14 

Most carnation cultivars are diploid, with a chromosome
number of 2n = 2x = 30. 15 The reported
nuclear DNA content of carnation is 1.23-1.48 pg/2C,
which indicates that the carnation has a comparatively
small nuclear genome, approximately four
times the size of the Arabidopsis thaliana nuclear
genome. 16 The estimated genome size of carnation
(670 Mb) 16 is small compared with those of other ornamental
flowers, such as Rosa hybrida (1.1 Gb),
Antirrhinum majus (1.5 Gb), Petunia hybrida (1.6 Gb),
Chrysanthemum morifolium (9.4 Gb) and Tulipa gesneriana
(26 Gb), according to the Plant C-values database
(http://data.kew.org/cvalues/). To understand the 
genetic systems of carnation and to accelerate the 
process of molecular breeding, we performed structural 
analysis of the whole genome of carnation for the first 
time in ornamentals. The information and material 
resources for the carnation genome generated in this 
study should enhance both fundamental and applied 
studies of carnations and related plants. 

2. Materials and methods 

2.1. Plant materials 

The carnation cultivars 'Francesco' (for genome
sequencing) and 'Karen Rouge' (for BAC construction)
were grown under natural daylight conditions in a
greenhouse at NIFS. 'Francesco', a red Mediterranean
standard-type cultivar, is the leading cultivar in
Japan, 17 and 'Karen Rouge' is a cultivar with
bacterial wilt resistance derived from D. capitatus ssp.
andrzejowskianus. 18

2.2. Construction of BAC libraries and BAC DNA 
sequencing 

BAC libraries were constructed from nuclear DNA pre- 
pared from young leaves of 'Karen Rouge'. Nuclear DNA 
was partially digested with HindIII and size-selected,
and 100-180 kb DNA was ligated to the BAC vector
pIndigoBAC5 (Epicentre Biotechnologies, WI, USA) and
introduced into Escherichia coli ElectroMAX DH10B
cells (Life Technologies Co., CA, USA) by electroporation.

For shotgun sequencing of BAC clones, BAC DNAs pre- 
pared according to the standard procedure were frag- 
mented by nebulization, barcoded with a GS Titanium 
Rapid Library MID Adaptors Kit (Roche Diagnostics, 
IN, USA), and pooled for sequencing using the GS 
Titanium platform (Roche Diagnostics) according to 
the manufacturer's instructions. 

2.3. Shotgun sequencing of the carnation genome

Whole-genome shotgun sequencing of the cultivar
'Francesco' was performed using both HiSeq 1000
(Illumina Inc., CA, USA) and GS FLX+ (Roche
Diagnostics) sequencers. Genomic DNA extracted
from leaves was used for library construction according
to standard protocols. The libraries included paired-end
(PE) (insert size: 500 bp) and overlapping fragment
(OF) (insert size: 180 bp) libraries for the HiSeq 1000
sequencer, and single-end (SE) and PE libraries (insert
size: 4 kb) for the GS FLX+ system. In addition, two
Illumina mate-pair (MP) libraries with 3 and 5 kb
inserts, respectively, were constructed with GS
Titanium Library Paired End Adaptors (Roche
Diagnostics) as previously described. 19,20 The sequence
data collection is summarized in Supplementary Fig. S1.

2.4. Sequence assembly and evaluation of authenticity

The method used for assembly of the genomic sequences
of carnation is summarized in Supplementary
Fig. S1. Sequence assembly for the BAC DNAs was
performed using Newbler ver. 2.7 (Roche
Diagnostics). The authenticity of the assembled genomic
sequences described above was examined by aligning
the sequences of subcontigs contained in the scaffolds
of the genomic assemblies with the contig sequences
of the BAC clones using the Bowtie 2 program.
References for computer programs and databases are
listed in Supplementary Table S1.

2.5. cDNA sequences 

Carnation cDNA sequences generated by Sanger 
sequencing (accession numbers: FY382825-FY405424) 12
and GS FLX+ sequencing (accession
numbers: FX296474-FX334317) 13 were retrieved
from NCBI GenBank (http://www.ncbi.nlm.nih.gov/ 
genbank/). In order to generate a non-redundant 
cDNA data set, redundant cDNA sequences were 
removed with a CD-HIT tool. 

2.6. Detection of repetitive sequences 

Known repetitive sequences, including transposable 
elements (TEs), were detected with the RepeatMasker 
(http://www.repeatmasker.org) and TransposonPSI 
(http://transposonpsi.sourceforge.net) programs, and 
novel repeats were detected with RepeatScout and 
Piler. To exclude protein-coding genes, the novel 
repeat library was searched against the SWISS-PROT 
database (http://www.uniprot.org) with BLASTX. The 
known and novel repeat sets were merged and redun- 
dant repeats were removed. 

2.7. Assignment of RNA-coding genes

Genes for tRNAs were assigned using the tRNAscan- 
SE program. The rRNA genes were identified based on 
sequence similarity with those of A. thaliana. Genes 
for small nucleolar RNA (snoRNA) were predicted using
snoScan. MicroRNA (miRNA) genes were searched for
with the MapMi program against a miRBase library
of plant miRNA sequences; MapMi
assigns miRNA precursor sequences in genomes by a
combination of sequence alignment with Bowtie and
prediction of RNA secondary structures with RNAfold.
References for computer programs and databases are
listed in Supplementary Table S1.

2.8. Assignment of protein-encoding genes 

Two programs were used for the assignment of
protein-encoding genes in the carnation genome:
PASA (http://pasa.sourceforge.net), based on cDNA
alignment, and Augustus, based on ab initio gene prediction
incorporating cDNA alignment information. The
protein-encoding genes were first predicted using
PASA, and Augustus was then trained with a dataset
comprising 300 PASA-predicted genes likely containing
complete coding regions. The trained Augustus was
run with a hint file, which was generated by
alignment of the cDNA set to the genome with BLAT.
Finally, the PASA- and Augustus-predicted datasets
were merged. Single-exon genes were retained only when
they were predicted in both datasets, rather than in just one,
because the prediction accuracy
of genes with a single exon is generally low.
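
The merging rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `GeneModel` class, the `merge_predictions` function and the idea of matching models by an identical locus string are all assumptions made for illustration, and a real merge would compare overlapping coordinates rather than exact keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    locus: str        # hypothetical key, e.g. "scaffold:start-end:strand"
    exon_count: int


def merge_predictions(pasa, augustus):
    """Merge two prediction sets.

    Multi-exon genes are kept if either program predicts them;
    single-exon genes are kept only if both programs predict them.
    """
    pasa_by_locus = {g.locus: g for g in pasa}
    aug_by_locus = {g.locus: g for g in augustus}
    merged = []
    for locus in sorted(set(pasa_by_locus) | set(aug_by_locus)):
        # Prefer the PASA model when both programs report the locus.
        model = pasa_by_locus.get(locus) or aug_by_locus[locus]
        in_both = locus in pasa_by_locus and locus in aug_by_locus
        if model.exon_count > 1 or in_both:
            merged.append(model)
    return merged


pasa_set = [
    GeneModel("s1:100-900:+", 3),
    GeneModel("s1:2000-2500:+", 1),
    GeneModel("s2:10-400:-", 1),
]
augustus_set = [
    GeneModel("s1:2000-2500:+", 1),
    GeneModel("s3:50-700:+", 1),
    GeneModel("s4:1-5000:+", 5),
]
kept = [g.locus for g in merge_predictions(pasa_set, augustus_set)]
```

In this toy input the single-exon models on `s2` and `s3` are dropped because only one program supports each of them.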

To deduce coding regions of the predicted genes,
all possible amino acid sequences translated from
three reading frames for multiple-exon genes and six
reading frames for single-exon genes were similarity-searched
against the UniProt-TrEMBL database
(http://www.ebi.ac.uk/uniprot/) with BLASTP. Translated
amino acid sequences that had similarity to a
protein in the database with E-value < 1E-5,
identity > 30% and minimal length > 50 amino acids
were selected and defined as the coding regions.
When none of the translated sequences of a gene showed
similarity to a known protein, the longest coding region was selected.
To confirm the coverage of the assembly and the accuracy
of the annotated genes, core eukaryotic genes were
mapped using CEGMA. References for computer programs
and databases are listed in Supplementary
Table S1. The predicted genes related to TEs were
excluded from further analyses.
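
The selection of a coding region from the candidate translations can be sketched as follows. This is a simplified illustration under stated assumptions: the `Translation` record and the function names are hypothetical, the length threshold is applied to the translated sequence itself, and ties between several supported frames are broken by the lowest E-value, which the text does not specify.

```python
from typing import NamedTuple, Optional


class Translation(NamedTuple):
    frame: int
    protein: str
    evalue: Optional[float]    # best BLASTP hit E-value, None if no hit
    identity: Optional[float]  # percent identity of that hit


MAX_EVALUE = 1e-5     # E-value < 1E-5
MIN_IDENTITY = 30.0   # identity > 30%
MIN_LENGTH = 50       # length > 50 amino acids


def is_supported(t):
    """True when the translation has a database hit passing all thresholds."""
    return (
        t.evalue is not None
        and t.identity is not None
        and t.evalue < MAX_EVALUE
        and t.identity > MIN_IDENTITY
        and len(t.protein) > MIN_LENGTH
    )


def choose_coding_region(translations):
    """Pick a supported frame if any; otherwise fall back to the longest one."""
    supported = [t for t in translations if is_supported(t)]
    if supported:
        return min(supported, key=lambda t: t.evalue)  # assumed tie-break
    return max(translations, key=lambda t: len(t.protein))


frame1 = Translation(1, "M" * 120, 1e-20, 45.0)
frame2 = Translation(2, "M" * 200, None, None)
frame3 = Translation(3, "M" * 40, 1e-30, 90.0)   # too short to qualify
weak = Translation(3, "M" * 60, 1e-2, 80.0)      # E-value too high
```

Here `choose_coding_region([frame1, frame2, frame3])` returns frame 1, while `choose_coding_region([frame2, weak])` falls back to the longest translation, frame 2.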

2.9. Comparison of metabolic pathways 

For comparison of the metabolic pathways, Beta vulgaris,
A. thaliana and Oryza sativa were chosen. B. vulgaris,
which includes numerous cultivated root vegetables
such as table beet and sugar beet, is a member of
the same order, Caryophyllales, as carnation and has a
relatively large number of registered genes within this
order. Arabidopsis thaliana and O. sativa are typical
models of dicots and monocots, respectively. The nucleotide
sequences of the gene repertoires of A. thaliana
and O. sativa were retrieved from the genome databases
of TAIR10 and IRGSP 1.0, respectively. For
B. vulgaris, expressed sequence tag (EST) sequences
were obtained from dbEST of the NCBI databases
and trimmed using the CROSS_MATCH program for
vector sequences provided in CROSS_MATCH and
NCBI's UniVec (http://www.ncbi.nlm.nih.gov/tools/vecscreen/univec/);
reads longer than 100 bp were
subjected to assembly by PHRED with default parameters.
The sequences of genes and unigenes thus
obtained, as well as those of the predicted carnation
genes, were mapped onto the KEGG reference pathways
by BLAST searches against genes in the KEGG database
with an E-value cutoff of 1E-10, length coverage > 25%
and identity > 50%, and the status of mapping was compared
among the four plant species. References for
computer programs and databases are listed in
Supplementary Table S1.

2.10. Functional classification of genes

The predicted gene sequences of carnation were 
searched against NCBI's Clusters of Orthologous 
Groups of proteins (KOG) by BLAST searches with 
E-value cutoff of 1E-20. In addition, functional domains
located in the translated sequences in genes and uni- 
genes were searched against the InterPro databases 
using InterProScan, and the detected domains were 
further classified into plant GO slim categories using 
the map2slim program. 



3. Results and discussion 

3.1. Sequencing the carnation genome

3.1.1. Shotgun sequencing and assembly of the carnation genome

Shotgun sequencing of
the genome of the carnation cv. 'Francesco' was
carried out using a combination of different sequencing
libraries for two NGSs, the HiSeq 1000 and the GS FLX+
systems. In the HiSeq 1000 system, a total of 1277.4
million (M), 1526.5 M, 442.6 M and 475.3 M reads corresponding
to 127.7, 152.6, 44.3 and 47.5 Gb of sequence
data were collected from the PE, OF, 3 kb MP and 5 kb
MP libraries, respectively (Supplementary Fig. S1). In
parallel, 5.9 M single reads (mean length: 663 bases)
and 1.3 M PE reads (mean length: 436 bases) corresponding
to 3.9 Gb and 589 Mb of sequence data, respectively,
were obtained using the GS FLX+ system
(Supplementary Fig. S1). The genome size of carnation
cv. 'Francesco', which was estimated by k-mer analysis 21
based on the HiSeq 1000 sequence data, was 622 Mb,
which is 93% of the previous estimate (670 Mb); 16
this value was adopted for subsequent analyses.
Total redundancy of the obtained sequence data
(376.6 Gb) was equivalent to ~604 times the estimated
genome size.
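
The redundancy figure follows directly from the per-library data volumes quoted above. The short Python sketch below only re-does that arithmetic; the dictionary keys and function name are illustrative labels, and because the inputs are the rounded values from the text, the result (roughly 605-fold) differs slightly from the reported ~604-fold.

```python
# Sequence data per library in Gb, as quoted in the text (rounded values).
READ_DATA_GB = {
    "HiSeq PE": 127.7,
    "HiSeq OF": 152.6,
    "HiSeq MP 3 kb": 44.3,
    "HiSeq MP 5 kb": 47.5,
    "GS FLX+ SE": 3.9,
    "GS FLX+ PE": 0.589,
}

GENOME_SIZE_GB = 0.622  # k-mer based estimate (622 Mb)


def fold_redundancy(data_gb, genome_gb):
    """Total sequenced bases divided by the estimated genome size."""
    return sum(data_gb.values()) / genome_gb


total_gb = sum(READ_DATA_GB.values())
redundancy = fold_redundancy(READ_DATA_GB, GENOME_SIZE_GB)
print(f"total data: {total_gb:.1f} Gb, redundancy: {redundancy:.0f}x")
```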

The method used for genomic data assembly is
described in the Materials and methods and summarized
in Supplementary Fig. S1 and Table S1. The
total length of the resulting genomic assemblies was
568.9 Mb, equivalent to 91% of the estimated genome
size, containing 69 Mb of gaps filled by N; the N50 values
of the contigs and scaffolds were 16 644 bp and
60 737 bp, respectively (Table 1).
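
For readers unfamiliar with the N50 and N90 statistics, the following Python sketch shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least 50% (or 90%) of the total assembly length. The function name and the toy lengths are illustrative and are not taken from the carnation assembly.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx statistic (N50 for fraction=0.5, N90 for fraction=0.9)."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must not be empty")


toy_lengths = [100, 80, 60, 40, 20]  # total 300
n50 = nxx(toy_lengths, 0.5)  # 100 + 80 = 180 >= 150, so N50 = 80
n90 = nxx(toy_lengths, 0.9)  # 100 + 80 + 60 + 40 = 280 >= 270, so N90 = 40
```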

Table 1. Statistics of D. caryophyllus cv. 'Francesco' genome assemblies

Limited length (a)                         Contigs (b)   Scaffolds
>100 bp              Total number          88 654        45 088
                     Total length (kb)     500 218       568 887
                     N50 (bp)              16 644        60 737
                     N90 (bp)              2036          6700
>1 kb                Total number          64 159        30 716
                     Total length (kb)     483 703       558 518
                     N50 (bp)              17 553        62 616
                     N90 (bp)              2678          8036
                     Maximum length (kb)   363           1287
                     GC content (%)        36.3

(a) Contigs or scaffolds shorter than the indicated length were
excluded from the statistics.
(b) Subcontigs in scaffolds, which were split by gap regions with
length >4 bp.

The N50 value of the final scaffolds was relatively
low, probably due to the heterozygotic nature (~0.2%
heterozygosity estimated in this study; data not
shown) of the carnation genome. Because current
methods for de novo assembly using de Bruijn graphs 
split assemblies at heterologous polymorphic sites, as- 
semblies of heterozygous diploid genomes tend to be 
fragmented. However, the N50 value of the contigs 
contained in the scaffolds was 16.6 kb (Table 1),
which allowed reliable characterization and gene 
annotations. Mapping of 248 core eukaryotic genes 
by the CEGMA program indicated that 96% of the 
core genes were completely covered in the genome as- 
semblies. Comparison of the independently deter- 
mined sequences of the two BAC clones with the 
assembled genomic sequences showed perfect align- 
ment with correct order and coverage, demonstrating 
that the coverage and quality of the assembled 
genomic sequences were high. 

3.1.2. Correlation of the genomic sequences with a genetic linkage map

We constructed an
SSR-based reference genetic linkage map of the carnation
genome comprising 412 SSR loci on a total
length of 969.6 cM. 14 To correlate the genomic
sequences obtained in this study to their positions on
this linkage map, we searched the assembled genomic
sequences for sequences of these markers and their adjacent
regions using the BLASTN program. All primer
sequences, sequences of the flanking regions and SSR
motifs were successfully mapped to the assembled
genomic sequences, although there were single-base
substitutions or small deletions several bases long.
Single corresponding scaffolds could be identified for
378 (91.7%) of the 412 SSR loci (Supplementary
Table S2), and the remaining SSR loci were assigned to
multiple scaffolds containing identical or highly
similar sequences. Consequently, 268 scaffolds could
be located on the genetic linkage map. Some of the scaffolds
covered multiple marker loci, e.g. scaffold6 on LG
85P_5, with six SSR loci. The longest mapped scaffold
was scaffold2 on LG 85P_9, which covered a 1.2 Mb
region and three marker loci. The total length of the
mapped scaffolds was 51.4 Mb, equivalent to 8.3% of
the estimated genome size.
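
The distinction between SSR loci assigned to a single scaffold and those matching several scaffolds can be sketched in Python as below. This is an illustration only: the marker names, the `(locus, scaffold)` hit format and the `assign_loci` function are hypothetical, and the filtering of BLASTN hits that would precede this step is omitted.

```python
from collections import defaultdict


def assign_loci(hits):
    """Split loci into uniquely and ambiguously placed ones.

    hits: iterable of (locus_name, scaffold_name) pairs, one per accepted hit.
    Returns (unique, ambiguous), where unique maps locus -> scaffold and
    ambiguous maps locus -> sorted list of candidate scaffolds.
    """
    scaffolds_by_locus = defaultdict(set)
    for locus, scaffold in hits:
        scaffolds_by_locus[locus].add(scaffold)

    unique = {}
    ambiguous = {}
    for locus, scaffolds in scaffolds_by_locus.items():
        if len(scaffolds) == 1:
            unique[locus] = next(iter(scaffolds))
        else:
            ambiguous[locus] = sorted(scaffolds)
    return unique, ambiguous


# Hypothetical marker names and hits, for illustration only.
example_hits = [
    ("markerA", "scaffold6"),
    ("markerA", "scaffold6"),
    ("markerB", "scaffold2"),
    ("markerC", "scaffold10"),
    ("markerC", "scaffold77"),
]
unique_loci, ambiguous_loci = assign_loci(example_hits)
mapped_scaffolds = set(unique_loci.values())
```

Counting the distinct values in `unique_loci` gives the number of scaffolds that can be anchored to the linkage map through uniquely placed markers.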

3.2. Characteristic features of the carnation genome 

3.2.1. Repetitive sequences

The repetitive sequences
found in the assembled genomic sequences
comprised 112 078 known TEs, 354 221 simple
repeats and low-complexity sequences, and 331 831
novel repeats defined by de novo repeat finding, the
sum of which corresponds to 33% of the assembled
genomic sequences (Table 2). Comparison of the TE
contents of the carnation genome with those of other
plant species, such as A. thaliana, Brassica rapa, potato,
tomato, cotton, soybean, sweet orange, rice, foxtail
millet and sorghum (Supplementary Table S3),
showed that the relative content of known TEs was
lower, whereas those of unclassified repeats and simple
repeats appeared to be higher, in carnation. The lower
content of TEs in the assembled genomic sequences in
carnation may be attributed to the escape of TE
sequences during the process of sequence assembly,
since the carnation assemblies in this study were
highly fragmented at the positions of potential repetitive
sequences.

Table 2. Repetitive sequences identified in the carnation genome

Repeat class                   Number    Total length (kb)   Genome content (%) (a)   Repeat composition (%)
Class 1 TE
  LTR                          86 111    36 558              7.3                      22.1
  SINE                         441       35.7                0.007                    0.02
  LINE                         7362      3447                0.69                     2.1
  Other                        10        1.7                 0.0003                   0
Class 2 TE
  DNA transposon               16 729    3914                0.78                     2.4
  Rolling circle               1425      535                 0.11                     0.33
Other type TE                  48        21.1                0.004                    0.01
Simple repeat                  295 621   16 847              3.4                      10.3
Low complexity                 58 600    3528                0.71                     2.2
Other repeat                   411       65                  0.004                    0.01
Unclassified (novel) repeat    331 831   100 097             20                       60.6
Total                          794 297   165 027             33                       100

LTR, long terminal repeat; SINE, short interspersed nuclear
element; LINE, long interspersed nuclear element.
(a) Percentage of total length of repeats in the continuous
sequences in the assembled genome.
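
The genome content values for the major repeat classes can be checked with a few lines of Python. The sketch assumes that the denominator is the total contig length of 500 218 kb from Table 1, which is how the "continuous sequences" wording of the Table 2 footnote is read here; the variable and function names are illustrative.

```python
CONTIG_TOTAL_KB = 500_218  # total contig length (>100 bp) from Table 1

# Total length in kb for selected repeat classes, from Table 2.
REPEAT_LENGTH_KB = {
    "LTR": 36_558,
    "LINE": 3_447,
    "DNA transposon": 3_914,
    "Simple repeat": 16_847,
    "Unclassified (novel) repeat": 100_097,
    "Total": 165_027,
}


def genome_content_percent(length_kb, total_kb=CONTIG_TOTAL_KB):
    """Percentage of the continuous assembled sequence occupied by a repeat class."""
    return 100.0 * length_kb / total_kb


content = {
    name: genome_content_percent(length)
    for name, length in REPEAT_LENGTH_KB.items()
}
for name, value in content.items():
    print(f"{name}: {value:.2f}%")
```

Under this assumption the computed values round to the figures listed in the table, for example 7.3% for LTR elements and 33% for all repeats combined.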

In carnation, the mechanism underlying variegated
flower colour has long been of major horticultural interest. 22
An excision event of Class II DNA TEs has been
identified from the genes for enzymes involved in
anthocyanin biosynthesis. 23 Tdic101, a member of hAT
(Ac/Ds) elements, was identified as an autonomous
TE encoding an active transposase protein. 4 A single
homologue of Tdic101, designated Tic104, whose sequence
is 99% identical to
that of Tdic101 (Supplementary Table S4), was detected in the assembled
genomic sequences. 4 Tic104 has one nucleotide substitution
in a terminal inverted repeat sequence and
another in the coding region of the transposase gene
that generates a stop codon (Supplementary Fig. S2). The
CACTA element (En/Spm) dTac1 was first identified
in a gene for glutathione S-transferase; this element, here designated
Tac101, is not likely to encode a functional
transposase. 24 A similarity search against the
'Francesco' genomic sequences detected four types of
CACTA elements, designated Tac201, Tac301, Tac401
and Tac501, in addition to Tac101 (Supplementary
Table S4 and Fig. S3). None of these is likely to encode
intact transposases that are active for transposition.

The genome of 'Francesco' contains an acyl-glucose-dependent
anthocyanin 5-glucosyltransferase (AA5GT)
gene with an insertion by Ty1-1 (Ty1dic1), resulting
in synthesis and accumulation of pelargonidin 3-O-malylglucoside,
which lacks the glucose moiety at the 5
position. Ty1-1 in the 'Francesco' genome has one
nucleotide substitution in the coding region of the
transposase gene that generates a stop codon, which differs
from Ty1dic1. Similarity searching of the assembled
genomic sequences indicated that there were six
genes for Ty1-1 longer than 1 kb (Supplementary
Table S4 and Fig. S4).

3.2.2. Genes coding for RNAs

Thirteen genes for
rRNAs and 1050 intact genes for tRNAs were identified
in the assembled genomic sequences. Comparison of
the number of genes for tRNAs in the genomes of 16
plant species indicated that the carnation genome contains
many more tRNA genes than other plant species
(Supplementary Table S5). A total of 92 and 143
genes for snoRNA and miRNA, respectively, were also
assigned.

3.2.3. Prediction of protein-encoding genes

We
assigned protein-encoding genes in the carnation
genome using two types of computer programs, PASA
and Augustus. As a result, 10 519 protein-encoding
genes predicted by PASA and 99 123 genes predicted
by Augustus were merged, resulting in assignment of
56 137 protein-encoding genes including those in TEs
(Supplementary Table S6).

A similarity search against the UniProt-TrEMBL database
indicated that the translated amino acid sequences
of 42 047 predicted protein-encoding
genes showed significant sequence similarity to
the registered genes, and 12 871 were similar to
TEs. Consequently, 43 266 protein-encoding genes excluding
those in TEs were assigned in the assembled
genomic sequences, ~72% of which showed sequence
similarity to registered genes (Supplementary Table
S6). It is possible that the gene number was overestimated
due to fragmentation of the assembled
sequences and the heterozygotic nature of the genome.

To estimate the gene coverage of the assembled 
genomic sequences and the accuracy of our gene pre- 
diction, we mapped core eukaryotic genes onto the 
assembled genomic sequences. Of the 248 core 
eukaryotic genes, 238 (96%) matched the entire coding 
regions, which increased to 242 (98%) if partial 
matches were included. 
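
The gene counts and core-gene coverage figures in this section are linked by simple arithmetic, which the following Python sketch makes explicit. The variable names are illustrative; the numbers are those quoted in the text.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Gene counts quoted in the text.
merged_genes = 56_137        # PASA + Augustus, including TE-related genes
te_related_genes = 12_871    # predicted genes similar to TEs
non_te_genes = merged_genes - te_related_genes

# Core eukaryotic gene (CEGMA) mapping results.
core_genes = 248
complete_matches = 238
complete_or_partial = 242

complete_pct = percent(complete_matches, core_genes)
partial_pct = percent(complete_or_partial, core_genes)
print(non_te_genes, round(complete_pct), round(partial_pct))
```

Subtracting the TE-related genes from the merged set gives the 43 266 genes reported, and the two CEGMA ratios round to 96% and 98%.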



3.3. Comparison of carnation gene repertoire with
those of other plant species

To compare metabolic pathway genes, we examined
the genes of B. vulgaris, A. thaliana and O. sativa. Since
the genomic sequences of B. vulgaris are not yet available,
29 830 EST sequences of B. vulgaris retrieved
from dbEST were assembled into 14 058 unigenes
consisting of 5810 contigs and 8248 singletons (total
size: 9 781 432 bp) and used for comparison. For
A. thaliana and O. sativa, the complete gene sets,
35 386 and 42 136 genes, respectively, were retrieved
from the TAIR10 and IRGSP 1.0 databases, respectively.

The translated amino acid sequences of 43 491
coding sequences (CDSs) including splicing variants in
carnation were searched against the unigenes of
B. vulgaris and the complete gene sets of A. thaliana
(TAIR10) and O. sativa (IRGSP 1.0) using BLAST with
an E-value cutoff of 1E-10. The distributions of the percentage
of amino acid sequence identities are shown in
Supplementary Fig. S5. The degree of similarity decreased in
the order B. vulgaris, A. thaliana and O. sativa, which is consistent
with their phylogenetic relationships.

The unigenes of B. vulgaris and the genes of A. thaliana
(TAIR10) and O. sativa (IRGSP 1.0) were classified
into KOG categories (Fig. 1). The number of unigenes
classified into KOG was 6937 (49.3%) in B. vulgaris,
and the number of genes classified into KOG was
19 005 (43.7%), 18 250 (51.6%) and 18 065 (42.9%)
in carnation, A. thaliana and O. sativa, respectively.
The ratio of the genes in KOG Q (Secondary metabolites
biosynthesis, transport and catabolism) was relatively
high in carnation. For classification of the genes based
on GO slim, on the other hand, the number of unigenes
classified into GO categories was 5676 (40.4%) in
B. vulgaris, and that of genes classified into GO categories
was 16 423 (37.8%), 21 875 (61.8%) and 20 203
(47.9%) in carnation, A. thaliana and O. sativa, respectively.
This result indicated that the proportions of genes in the categories
'nucleobase, nucleoside, nucleotide and nucleic acid
metabolic process (Biological Process: BP)', 'cell wall
(Cellular Component: CC)' and 'transferase activity
(Molecular Function: MF)' were relatively high in carnation
(Supplementary Fig. S6).

The unigenes of B. vulgaris and the complete gene
sets comprising 35 386 and 42 136 genes of
A. thaliana (TAIR10) and O. sativa (IRGSP 1.0), respectively,
were mapped onto KEGG reference pathways. As a
result, 11 030 of 43 491 translated amino acid sequences
of CDSs in carnation, 13 979 of 14 058 unigenes
in B. vulgaris, 13 154 of 35 386 genes in
A. thaliana (TAIR10) and 12 082 of 42 136 in
O. sativa (IRGSP 1.0) were successfully mapped onto the
KEGG reference pathways (Supplementary Table S7).
The pathways including the genes mapped only in
carnation are as follows: 'Pentose phosphate pathway',
'Galactose metabolism', 'Ether lipid metabolism',
'Alanine, aspartate and glutamate metabolism', 'Glycine,
serine and threonine metabolism', 'Cysteine and methionine
metabolism', 'Metabolism of xenobiotics by
cytochrome P450' and several others.

3.4. Genes characteristic of carnation

3.4.1. Genes for enzymes involved in phenylpropanoid
biosynthetic pathways

The phenylpropanoid
biosynthetic pathway that begins with phenylalanine
and results in the production of anthocyanin,
flavonoid and lignin is one of the most well-studied
secondary metabolic pathways (Supplementary Fig.
S7). Similarity searches and phylogenetic analyses 25
for genes known to be involved in the synthesis of
phenylpropanoid, flavonoid and lignin against the
assembled genomic sequences of carnation were performed,
and the results are summarized in Supplementary
Table S8.

Similarity searches detected a single possible flavon- 
oid 3'-hydroxylase (F3'H) homologue among 241 
genes homologous to cytochrome P450 in the genome 
of carnation cv. 'Francesco'. The F3'H gene in cv. 'Mrs. 
Purple' is known to code for an active protein of 514
amino acid residues. 4 By contrast, the F3'H homologue
in cv. 'Francesco' contains one extra guanine nucleotide
in the potential coding region, resulting in the
production of a truncated 193 amino acid peptide.

In addition, a total of 84, 25, 120, 93 and 139
putative homologues of glutathione S-transferases,
multidrug and toxic compound extrusion (MATE)-type
transporters, UDP-sugar dependent glycosyltransferases,
Myb transcription factors and bHLH transcription
factors, respectively, which are known to be involved in the
phenylpropanoid biosynthetic pathway, were assigned in the
assembled genomic sequences. Details of these genes
as well as the results of phylogenetic analysis are
shown in Supplementary Figs S8-S12.

3.4.2. Genes involved in betalain synthesis

Betalains
are commonly synthesized instead of anthocyanin in
the order Caryophyllales, except for two families,
Caryophyllaceae and Molluginaceae. Although carnation
is classified under the family Caryophyllaceae and
produces anthocyanin as the main plant pigment in
petals, and often in other organs, carnation may also
contain the genes required to produce betalains.
Three characteristic enzymes are involved in synthesis
of betalains, i.e. L-dihydroxyphenylalanine (DOPA)
4,5-dioxygenase (DOD), 6,26 cytochrome P450 monooxygenase
(CYP76AD) 27 and cyclo-DOPA glucosyltransferase
(cD5GT), in the UDP-dependent
glycosyltransferase (UGT) family. 28,29

One copy of a DOD homologue (Dca8668) was
found in the carnation genomic sequences by similarity
searching. Multiple alignments revealed that the amino
acid sequence of the carnation DOD homologue 
contains the conserved motif typical of non-betalain- 
producing plants (Supplementary Fig. S1 3). This result 
suggests that the carnation DOD homologue does not 
catalyze the formation of betalamic acid. 

CYP76AD belongs to the cytochrome P450 monooxygenase
(CYP) family, which is one of the most divergent
families in higher plants, 27 with 241 homologues of
CYP detected in the carnation genomic sequences.
Dca32662 showed the highest amino acid identity
(66.4%) with B. vulgaris CYP76AD1. However,
the fact that this protein belongs to CYP76C implied
that the carnation genome does not encode a betalain-related
CYP family protein. A single homologue of the
Mirabilis jalapa cD5GT (MjcD5GT) gene was found.
The deduced amino acid sequence of this UGT
homologue showed 58.4% identity to MjcD5GT and
62.5% to Celosia cristata cD5GT.

3.4.3. Genes involved in chlorophyll and carotenoid synthesis

Similarity searches against the
assembled genomic sequences in carnation identified
all of the genes involved in the chlorophyll metabolic
pathway in the carnation genome (Supplementary
Table S9). We found multiple genes encoding putative
isozymes, including glutamate-1-semialdehyde 2,1-aminotransferase
(GSA), 5-aminolevulinate dehydratase
(HEMB), porphobilinogen deaminase (HEMC),
uroporphyrinogen III decarboxylase (HEME), magnesium
chelatase D subunit (CHLD), Mg-protoporphyrin
IX methyltransferase (CHLM), Mg-protoporphyrin IX
monomethylester cyclase (CRD), protochlorophyllide
oxidoreductase (PORA) and divinyl chlorophyllide a 8-vinyl
reductase (DVR), which are involved in chlorophyll
synthesis in carnation. By contrast, all of the enzymes
involved in the chlorophyll cycle and chlorophyll degradation
are each likely to be encoded by a single gene.
STAY-GREEN (SGR), a protein involved in senescence-related
chlorophyll degradation, is encoded by three
genes.

Similarity searches revealed that most of the enzymes
involved in carotenoid biosynthesis are encoded by
single genes in the carnation genome (Supplementary
Table S10). In addition, two homologues of 9-cis-epoxycarotenoid
dioxygenase (NCED) genes involved
in abscisic acid biosynthesis in A. thaliana, 30 two homologues
of carotenoid cleavage dioxygenase (CCD4) genes
as well as single homologues each of CCD1, CCD7
and CCD8, genes related to different enzyme activities
and substrate specificities, 30 were assigned
(Supplementary Fig. S14). No ESTs encoding CCDs were
found in the carnation EST dataset (Supplementary
Table S10), 13,31 suggesting that their expression levels
are low.

3.4.4. Disease resistance genes We searched for 
nucleotide-binding site-leucine-rich repeat (NBS-LRR) 
genes in the assembled genomic sequences of car- 
nation and assigned 217 NBS-containing potential 
Resistance (R) genes (Supplementary Table S11). The
number of NBS-LRR genes in carnation is larger than 
that of cucumber, 3 melon 33 and papaya, 34 but 
smaller than that of tomato, 35 grape 36 and rice. 37 In 
this study, only three potential genes with Toll/interleu- 
kin-1 receptor (TIR) domains at the N-terminal were 
identified, two of which lack LRRs, while 69 genes con- 
taining a coiled-coil motif were assigned, 52 of which 
lack LRRs. This result is consistent with the previous 
observation that the genome of B. vulgaris lacks TIR- 
type resistance genes despite the fact that this plant 
is a dicot. 38 Our present results support the assump- 
tion that the loss of TIR-type resistance genes is not 
restricted to cereals or monocots in general, 38 indi- 
cating the unique feature of R gene evolution in 
Caryophyllales. It should also be noted that more than 
50% of the 114 NBS-containing genes lack N- and
C-terminal domains (Supplementary Table S11), with
30 genes (NL) possessing only NBS-LRR domains. 

The 217 potential NBS-type R genes predicted in this
study were assigned to 125 scaffolds. Of the 125
scaffolds, 87 contain single NBS genes, while the other
scaffolds contain multiple NBS genes. The positions of eight
scaffolds containing five or more NBS genes are shown
in Supplementary Fig. S15. Approximately 30% of the
NBS-type R genes are clustered in these scaffolds, and
the scaffold most abundant in NBS-type R genes is
No.146, which contains 11 genes. These results
suggest that R genes are unevenly distributed in the
carnation genome, as was reported for a wide range of
plants. 33-35,37
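
As a rough cross-check, the counts reported above imply a few figures that are not stated explicitly. The following Python sketch derives them; the variable names are illustrative, and the gene count behind "approximately 30%" is only an estimate, since the exact number is not given here.

```python
# Counts reported in the text above.
total_nbs_genes = 217        # predicted NBS-type R genes
scaffolds_with_nbs = 125     # scaffolds carrying at least one NBS gene
single_gene_scaffolds = 87   # scaffolds carrying exactly one NBS gene

# Scaffolds carrying two or more NBS genes.
multi_gene_scaffolds = scaffolds_with_nbs - single_gene_scaffolds

# NBS genes located on those multi-gene scaffolds.
genes_on_multi = total_nbs_genes - single_gene_scaffolds

# Assumption: "approximately 30%" is taken as exactly 0.30 to estimate how
# many genes sit on the eight scaffolds with five or more NBS genes.
approx_clustered = round(0.30 * total_nbs_genes)

print(multi_gene_scaffolds)  # 38
print(genes_on_multi)        # 130
print(approx_clustered)      # 65
```

In other words, 38 scaffolds share 130 of the NBS genes, and roughly half of those genes are concentrated on just eight scaffolds.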

3.4.5. Genes involved in ethylene metabolism Carnation has
been used as a model system to study the mechanism of
ethylene-induced flower senescence. 10 Components of the
ethylene signal transduction pathway, such as the
ethylene receptors, Constitutive Triple Response
(CTR), Ethylene-Insensitive 2 (EIN2) and EIN3, regulate
a series of senescence-related genes. 10 By searching
the carnation genomic sequences, six putative genes
for ethylene receptors (DcETR1, DcETR2, DcETR3,
DcETR4, DcERS1 and DcERS2), two genes for CTR, two
genes for EIN2 and three EIN3-like genes were
identified (Supplementary Table S12). Phylogenetic
analysis classified the six ethylene receptors in carnation
into two subfamilies: DcETR1, DcERS1 and DcERS2 are
in Subfamily 1, which contains conserved histidine
kinase domains, and DcETR2, DcETR3 and DcETR4 are
in Subfamily 2 (Supplementary Fig. S16).

With respect to ethylene biosynthesis, three
1-aminocyclopropane-1-carboxylic acid (ACC) synthase
genes (DcACS1, DcACS2 and DcACS3) and one ACC
oxidase gene (DcACO1) have been identified. 39 Based
on sequence similarity, we found six more genes
for ACS (DcACS4-DcACS9) and four more genes for
ACO (DcACO2-DcACO5; Supplementary Table S12).
Notably, the putative gene products of DcACS4,
DcACS5 and DcACS6 lack the motifs BOX6 and BOX7,
BOX1 and BOX2, and BOX6, respectively, while those
of the other ACSs contain all seven conserved boxes found
in ACS isozymes in A. thaliana. 40 Since only eight out
of 12 ACS genes are catalytically active in A. thaliana, 40
it is probable that not all of the ACS genes identified in
the carnation genome encode active ACS for synthesizing
ethylene.
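
The motif pattern described above can be tabulated in a few lines. This is only an illustrative Python sketch: the data structure and the helper `boxes_present` are hypothetical, and the presence of all seven boxes is treated purely as a sequence feature, not as evidence of catalytic activity.

```python
# The seven conserved boxes found in A. thaliana ACS isozymes.
ALL_BOXES = {f"BOX{i}" for i in range(1, 8)}

# Boxes reported as missing from each putative gene product in the text.
# Genes not listed here are reported to contain all seven boxes.
MISSING = {
    "DcACS4": {"BOX6", "BOX7"},
    "DcACS5": {"BOX1", "BOX2"},
    "DcACS6": {"BOX6"},
}

# Three previously identified genes plus the six found in this study.
GENES = [f"DcACS{i}" for i in range(1, 10)]


def boxes_present(gene):
    """Return the set of conserved boxes retained by a gene product."""
    return ALL_BOXES - MISSING.get(gene, set())


# Genes whose products retain every conserved box.
complete = [g for g in GENES if boxes_present(g) == ALL_BOXES]

print(len(GENES))     # 9
print(complete)       # ['DcACS1', 'DcACS2', 'DcACS3', 'DcACS7', 'DcACS8', 'DcACS9']
```

Six of the nine ACS genes therefore retain the full set of conserved boxes, and three lack at least one.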

3.4.6. Genes involved in carbohydrate metabolism 
and cell wall modification during flower 
opening Large amounts of soluble carbo- 
hydrate are required for flower opening as substrates 
for respiration and cell wall synthesis. Accumulation of 
substantial amounts of pinitol, one of the rare sugars 
potentially involved in salinity tolerance, is a unique 
aspect of sugar metabolism during flower opening in 
carnation. 41 One gene (Dca24344) showing strong 
homology to known genes encoding myo-inositol 
methyl transferase (IMT), which catalyzes the
conversion of myo-inositol to pinitol, was found in the
carnation genome.

Xyloglucan endotransglucosylase/hydrolase (XTH), 
which is involved in the hydrolysis and reconstruction 
of xyloglucan in the matrix polysaccharides of the cell 
wall, 42 is believed to play an essential role in increasing 
cell wall extensibility followed by water uptake during 
the process of petal cell expansion. 43 Genome-wide 
analysis has revealed 33 and 29 genes in the XTH 
family in A. thaliana 44 and O. sativa, 45 respectively. In
the carnation genomic sequences, 32 genes putatively
encoding XTH were detected (Supplementary Table
S13 and Figs. S17, S18), 11 of which were reported to
be expressed in flowers and in some vegetative
tissues. 13,46

3.4.7. Genes related to floral scent Floral scent is a
notable breeding target for carnation. 47 Methylation
by S-adenosyl-L-methionine-dependent
methyltransferases belonging to the SABATH family 48 is an
important catalytic process for the emission of scent
components from flowers. Methyl benzoate, a major scent
component of modern carnation cultivars, 47 is also
derived from the methylation of benzoic acid. A
similarity search against the carnation genomic sequences
detected 11 genes in the SABATH family (DcSABATH1-17),
which are candidate genes for benzoic acid
methyltransferase in carnation (Supplementary Table S14).
Phylogenetic analysis of functionally characterized
members of the SABATH family showed that benzoic
acid and salicylic acid methyltransferases form a
monophyletic lineage irrespective of plant species
(Supplementary Fig. S19). 49 In contrast, DcSABATHs,
the SABATH members in carnation, did not strongly
associate with this lineage.



3.5. Database and data retrieval 

All of the information about the assembled scaffold 
sequences, known and novel repetitive sequences, 
genes for non-coding RNA (tRNA, rRNA, snoRNA and 
miRNA) and potential protein-encoding genes is
available through the Carnation DB
(http://carnation.kazusa.or.jp).

All the sequence data obtained in this study are
available under the BioProject ID PRJDB1491. The
accession numbers of the reads sequenced by Illumina
HiSeq 1000 are as follows: PE (insert size = 500 bp):
DRX012625, OF (insert size = 180 bp): DRX012624,
MP (insert size = 3 kb): DRX012626, MP (insert
size = 5 kb): DRX012627. The accession numbers of
the reads sequenced by Roche GS FLX+ are as follows:
SE: DRX012628, PE (insert size = 4 kb): DRX012629.
The nucleotide sequences of assembled scaffolds
can be retrieved under the accession numbers
DF340864-DF357213 (16 350 entries).
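
The stated entry count agrees with the accession range if the range is taken to be contiguous and inclusive. A minimal Python sketch of that check follows; the function `accession_range_size` is a hypothetical helper, not part of any database API, and contiguity of the range is an assumption.

```python
def accession_range_size(first, last, prefix="DF"):
    """Count accessions in an inclusive range, assuming it is contiguous."""
    if not (first.startswith(prefix) and last.startswith(prefix)):
        raise ValueError("accession does not start with the expected prefix")
    # Strip the alphabetic prefix and compare the numeric parts.
    start = int(first[len(prefix):])
    end = int(last[len(prefix):])
    if end < start:
        raise ValueError("last accession precedes first accession")
    return end - start + 1


print(accession_range_size("DF340864", "DF357213"))  # 16350
```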

Supplementary Data: Supplementary Data are 
available at www.dnaresearch.oxfordjournals.org. 